Abstract

In this paper we consider a class of dynamic vehicle routing problems, 
in which a number of mobile agents in the plane must visit target points 
generated over time by a stochastic process. It is desired to design motion 
coordination strategies in order to minimize the expected time between 
the appearance of a target point and the time it is visited by one of the 
agents. We propose control strategies that, while making minimal or no 
assumptions on communications between agents, provide the same level 
of steady-state performance achieved by the best known decentralized 
strategies. In other words, we demonstrate that inter-agent communica- 
tion does not improve the efficiency of such systems, but merely affects 
the rate of convergence to the steady state. Furthermore, the proposed 
strategies do not rely on the knowledge of the details of the underlying 
stochastic process. Finally, we show that our proposed strategies provide 
an efficient, pure Nash equilibrium in a game theoretic formulation of the 
problem, in which each agent's objective is to maximize the number of 
targets it visits. Simulation results are presented and discussed. 



1 Introduction 

A very active research area today addresses coordination of several mobile 
agents: groups of autonomous robots and large-scale mobile networks are be- 
ing considered for a broad class of applications, ranging from environmental 
monitoring, to search and rescue operations, and national security. 

An area of particular interest is concerned with the generation of efficient co- 
operative strategies for several mobile agents to move through a certain number 
of given target points, possibly avoiding obstacles or threats [1-5]. Trajectory 
efficiency in these cases is understood in terms of cost for the agents: in other
words, efficient trajectories minimize the total path length, the time needed 
to complete the task, or the fuel/energy expenditure. A related problem has 
been investigated as the Weapon-Target Assignment (WTA) problem, in which
mobile agents are allowed to team up in order to enhance the probability of a
favorable outcome in a target engagement [6,7]. In this setup, target locations
are known and an assignment strategy is sought that maximizes the global suc- 
cess rate. In a biological setting, the closest parallel to many of these problems 
is the development of foraging strategies, and of territorial vs. gregarious behav- 
iors [8], in which individuals choose to identify and possibly defend a hunting 
ground. 

In this paper we consider a class of cooperative motion coordination prob- 
lems, to which we can refer as dynamic vehicle routing, in which service requests 
are not known a priori, but are dynamically generated over time by a stochastic 
process in a geographic region of interest. Each service request is associated to 
a target point in the plane, and is fulfilled when one of a team of mobile agents 
visits that point. For example, service requests can be thought of as threats 
to be investigated in a surveillance application, events to be measured in an 
environmental monitoring scenario, and as information packets to be picked up 
and delivered to a user in a wireless sensor network. It is desired to design 
a control strategy for the mobile agents that provably minimizes the expected 
waiting time between the issuance of a service request and its fulfillment. In 
other words, our focus is on the quality of service as perceived by the "end user," 
rather than, for example, fuel economies achieved by the mobile agents. Sim- 
ilar problems were also considered in [9,10], and decentralized strategies were 
presented in [11]. This problem has connections to the Persistent Area Denial 
(PAD) and area coverage problems discussed, e.g., in [3,12-14]. 

A common theme in cooperative control is the investigation of the effects 
of different communication and information sharing protocols on the system 
performance. Clearly, the ability to access more information at each single 
agent cannot decrease the performance level; hence, it is commonly believed
that providing better communication among agents will improve the system's
performance. In this paper, we prove that there are certain dynamic vehicle 
routing problems which can, in fact, be solved (almost) optimally without any 
explicit communication between agents; in other words, the no-communication 
constraint in such cases is not binding, and does not limit the steady-state 
performance. The main contribution of this paper is the introduction of a motion 
coordination strategy that does not require any explicit communication between 
agents, while achieving provably optimal performance in certain conditions. 

The paper is structured as follows: in Section 2 we set up and formulate
the problem we investigate in the paper. In Section 3 we introduce the pro-
posed solution algorithms, and discuss their characteristics. Section 4 is the
technical core of the paper, in which we prove the convergence of the perfor- 
mance provided by the proposed algorithms to a critical point (either a local 
minimum or a saddle point) of the global performance function. Moreover, we 
show that any optimal configuration corresponds to a class of tessellations of 



the plane that we call Median Voronoi Tessellations. Section 5 is devoted to
a game-theoretic interpretation of our result in which the agents are modeled
as rational autonomous decision makers trying to maximize their own utility
function. We prove that, following the policy prescribed by our algorithm, the
agents reach an efficient pure Nash equilibrium, which may nevertheless be
suboptimal with respect to the global utility function (in this case the expected
time of service). In Section 6 we present some numerical results, while Section 7
is dedicated to final remarks and further extensions of this line of research.

2 Problem Formulation 

Let $\Omega \subset \mathbb{R}^2$ be a convex domain on the plane, with non-empty interior; we will
refer to $\Omega$ as the workspace. A stochastic process generates service requests over
time, which are associated to points in $\Omega$; these points are also called targets.
The process generating service requests is modeled as a spatio-temporal Poisson
point process, with temporal intensity $\lambda > 0$, and an absolutely continuous
spatial distribution described by the density function $\varphi : \Omega \to \mathbb{R}_+$, with bounded
and convex support within $\Omega$ (i.e., $\varphi(q) > 0 \Leftrightarrow q \in Q \subseteq \Omega$, with $Q$ bounded
and convex). The spatial density function $\varphi$ is normalized in such a way that
$\int_\Omega \varphi(q)\,dq = 1$. Both $\lambda$ and $\varphi$ are not necessarily known.

A spatio-temporal Poisson point process is a collection of functions
$\{\mathcal{P} : \mathbb{R}_+ \to 2^\Omega\}$ such that, for any $t > 0$, $\mathcal{P}(t)$ is a random collection of points in $\Omega$,
representing the service requests generated in the time interval $[0,t)$, and such
that

• The total numbers of events generated in two disjoint time-space regions
are independent random variables;

• The total number of events occurring in an interval $[s, s+t)$ in a measurable
set $S \subseteq \Omega$ satisfies

$$\Pr\left[\mathrm{card}\left((\mathcal{P}(s+t) - \mathcal{P}(s)) \cap S\right) = k\right] = \frac{\exp(-\lambda t \cdot \varphi(S))\,(\lambda t \cdot \varphi(S))^k}{k!},$$

where this must hold for any $k \in \mathbb{N}$, and where $\varphi(S)$ is a shorthand for
$\int_S \varphi(q)\,dq$.

Each particular function $\mathcal{P}$ is a realization, or trajectory, of the Poisson point
process. A consequence of the properties defining Poisson processes is that the
expected number of targets generated in a measurable region $S \subseteq \Omega$ during a
time interval of length $\Delta t$ is given by:

$$E\left[\mathrm{card}\left((\mathcal{P}(t+\Delta t) - \mathcal{P}(t)) \cap S\right)\right] = \lambda \Delta t \cdot \varphi(S).$$
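The expected-count property above can be checked numerically. The following is a minimal sketch, assuming a uniform spatial density on the unit square and an arrival rate of 2 requests per unit time; the names `sample_requests` and `count_in_region` are illustrative and do not come from the paper.

```python
import random

def sample_requests(lam, horizon, rng):
    """Sample (time, (x, y)) service requests on the unit square over [0, horizon)."""
    requests = []
    t = rng.expovariate(lam)                 # first arrival time
    while t < horizon:
        q = (rng.random(), rng.random())     # uniform spatial density (assumption)
        requests.append((t, q))
        t += rng.expovariate(lam)            # exponential inter-arrival gaps
    return requests

def count_in_region(requests, t0, dt, in_region):
    """Number of requests issued in [t0, t0 + dt) whose location satisfies in_region."""
    return sum(1 for (t, q) in requests if t0 <= t < t0 + dt and in_region(q))

def left_half(q):
    """Region S = left half of the unit square, so that phi(S) = 0.5."""
    return q[0] < 0.5

rng = random.Random(0)
lam, dt, trials = 2.0, 1.0, 4000
mean_count = sum(
    count_in_region(sample_requests(lam, dt, rng), 0.0, dt, left_half)
    for _ in range(trials)
) / trials
expected = lam * dt * 0.5                    # lambda * Delta t * phi(S)
print(mean_count, expected)
```

With enough trials the empirical mean should approach the value predicted by the formula; the agreement is statistical, not exact.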

Without loss of generality, we will identify service requests with target points,
and label them in order of generation; in other words, given two targets $e_i, e_j \in \mathcal{P}(t)$,
with $i < j$, the service requests associated with these targets have been
issued at times $t_i \le t_j \le t$ (since events are almost never generated concurrently,
the inequalities are in fact strict almost surely).

A service request is fulfilled when one of $m$ mobile agents, modeled as point
masses, moves to the target point associated with it; $m$ is a possibly large, but
finite number. Let $p(t) = (p_1(t), p_2(t), \dots, p_m(t)) \in \Omega^m$ be a vector describing
the positions of the agents at time $t$. (We will tacitly use a similar notation
throughout the paper). The agents are free to move, with bounded speed,
within the workspace $\Omega$; without loss of generality, we will assume that the
maximum speed is unitary. In other words, the dynamics of the agents are
described by differential equations of the form

$$\frac{d p_i(t)}{dt} = u_i(t), \quad \text{with } \|u_i(t)\| \le 1, \quad \forall t \ge 0,\ i \in \{1,\dots,m\}. \qquad (1)$$

The agents are identical, and have unlimited range and target-servicing capa- 
bility. 

Let $B_i(t) \subset \Omega$ indicate the set of targets serviced by the i-th agent up to time
$t$. (By convention, $B_i(0) = \emptyset$, $i = 1, \dots, m$). We will assume that $B_i \cap B_j = \emptyset$
if $i \neq j$, i.e., that service requests are fulfilled by at most one agent. (In the
unlikely event that two or more agents visit a target at the same time, the target
is arbitrarily assigned to one of them).

Let $\mathcal{D} : t \to 2^\Omega$ indicate (a realization of) the stochastic process obtained
combining the service request generation process $\mathcal{P}$ and the removal process
caused by the agents servicing outstanding requests; in other words,

$$\mathcal{P}(t) = \mathcal{D}(t) \cup B_1(t) \cup \dots \cup B_m(t), \qquad \mathcal{D}(t) \cap B_i(t) = \emptyset, \quad \forall i \in \{1,\dots,m\}.$$

The random set $\mathcal{D}(t) \subset \Omega$ represents the demand, i.e., the service requests
outstanding at time $t$; let $n(t) = \mathrm{card}(\mathcal{D}(t))$.

Our objective in this paper will be the design of motion coordination strate- 
gies that allow the mobile agents to fulfill service requests efficiently (we will 
make this more precise in the following). In particular, in this paper we will 
concentrate on motion coordination strategies of the following two forms: 

$$\pi_i : (p_i, B_i, \mathcal{D}) \mapsto u_i, \quad i \in \{1,\dots,m\}, \qquad (2)$$

and

$$\pi_i : (p_1, \dots, p_m, B_i, \mathcal{D}) \mapsto u_i, \quad i \in \{1,\dots,m\}. \qquad (3)$$

An agent executing a control policy of the form (2) relies on the knowledge of
its own current position, on a record of targets it has previously visited, and 
on the current demand. In other words, such control policies do not need any 
explicit information exchange between agents; as such, we will refer to them as 
no communication (nc) policies. Such policies are trivially decentralized. 

On the other hand, an agent executing a control policy of the form (3) can
sense the current position of other agents, but still has information only on the
targets it has itself visited in the past (i.e., does not know what, if any, targets have
been visited by other agents). We call these sensor-based (sb) policies, to signify 



the fact that only factual information is exchanged between agents — as opposed 
to information related to intent and past history. Note that both families of 
coordination policies rely, in principle, on the knowledge of the locations of all 
outstanding targets. (However, as we will see in the following, only local target 
sensing will be necessary in practice). 

A policy $\pi = (\pi_1, \pi_2, \dots, \pi_m)$ is said to be stabilizing if, under its effect, the
expected number of outstanding targets does not diverge over time, i.e., if

$$\bar{n}_\pi = \lim_{t\to\infty} E\left[n(t) \mid \dot{p}_i(t) = \pi_i(p(t), B_i(t), \mathcal{D}(t)),\ i \in \{1,\dots,m\}\right] < \infty. \qquad (4)$$

Intuitively, a policy is stabilizing if the mobile agents are able to visit targets 
at a rate that is — on average — at least as fast as the rate at which new service 
requests are generated. 

Let Tj be the time elapsed between the issuance of the j-th service request, 
and the time it is fulfilled. If the system is stable, then the following balance 
equation (also known as Little's formula [15]) holds: 

$$\bar{n}_\pi = \lambda \bar{T}_\pi, \qquad (5)$$

where $\bar{T}_\pi := \lim_{j\to\infty} E[T_j]$ is the system time under policy $\pi$, i.e., the expected
time a service request must wait before being fulfilled, given that the mobile
agents follow the strategy defined by $\pi$. Note that the system time $\bar{T}_\pi$ can be
thought of as a measure of the quality of service, as perceived by the "user" 
issuing the service requests. 

At this point we can finally state our problem: we wish to devise a policy 
that is (i) stabilizing, and (ii) yields a quality of service (i.e., system time) 
achieving, or approximating, the theoretical optimal performance given by 

$$\bar{T}_{\mathrm{opt}} = \inf_{\pi\ \mathrm{stabilizing}} \bar{T}_\pi. \qquad (6)$$

Centralized and decentralized strategies are known that optimize or approx-
imate (6) in a variety of cases of interest [10, 11, 16, 17]. However, all such
strategies rely either on a central authority with the ability to communicate to
all agents, or on the exchange of certain information about each agent's strategy
with other neighboring agents. In addition, these policies require the knowledge
of the spatial distribution $\varphi$; decentralized versions of these implement versions
of Lloyd's algorithm for vector quantization [18].

In the remainder of this paper, we will investigate how the additional con-
straints posed on the exchange of information between agents by the models (3)
and (2) impact the achievable performance and quality of service. Remarkably,
the policies we will present do not rely on the knowledge of the spatial distribu-
tion $\varphi$, and are a generalized version of MacQueen's clustering algorithm [19].

3 Control policy description 

In this section, we introduce two control policies of the forms, respectively, (2)
and (3). An illustration of the two policies is given in Figure 1.



Figure 1: An illustration of the two control policies proposed in Section 3.
While no targets are outstanding, vehicles wait at the point that minimizes 
the average distance to targets they have visited in the past; such points are 
depicted as squares, while targets are circles and vehicles triangles. In the no- 
communication policy, at the appearance of a new target, all vehicles pursue it 
(left). In the sensor-based policy, only the vehicle that is closest to the target 
will pursue it (right). 



3.1 A control policy requiring no explicit communication 

Let us begin with an informal description of a policy $\pi_{nc}$ requiring no explicit
information exchange between agents. At any given time $t$, each agent computes
its own control input according to the following rule:

1. If $\mathcal{D}(t)$ is not empty, move towards the nearest outstanding target.

2. If $\mathcal{D}(t)$ is empty, move towards the point minimizing the average distance
to the targets that the agent itself has serviced in the past. If there is no
unique minimizer, then move to the nearest one.

In other words, we set 

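$$\pi_{nc}(p_i(t), B_i(t), \mathcal{D}(t)) = \mathrm{vers}\left(F_{nc}(p_i(t), B_i(t), \mathcal{D}(t)) - p_i(t)\right), \qquad (7)$$

where

$$F_{nc}(p_i, B_i, \mathcal{D}) = \begin{cases} \arg\min_{q \in \mathcal{D}} \|p_i - q\|, & \text{if } \mathcal{D} \neq \emptyset, \\ \arg\min_{q \in \Omega} \sum_{e \in B_i} \|e - q\|, & \text{otherwise,} \end{cases} \qquad (8)$$

$\|\cdot\|$ is the Euclidean norm, and

$$\mathrm{vers}(v) = \begin{cases} v/\|v\|, & \text{if } v \neq 0, \\ 0, & \text{otherwise.} \end{cases}$$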



The convex function $W : q \mapsto \sum_{e \in B} \|q - e\|$, often called the (discrete) Weber
function in the facility location literature [20, 21] (modulo normalization by
$\mathrm{card}(B)$), is not strictly convex only when the point set $B$ is empty (in which
case we set $W(\cdot) = 0$ by convention) or contains an even number of collinear
points. In such cases, the minimizer nearest to $p_i$ in (8) is chosen. We will call
the point $p_i^*(t) = F_{nc}(\cdot, B_i(t), \emptyset)$ the reference point for the i-th agent at time $t$.

In the $\pi_{nc}$ policy, whenever one or more service requests are outstanding, all
agents will be pursuing a target; in particular, when only one service request is 
outstanding, all agents will move towards it. When the demand queue is empty, 
agents will either (i) stop at the current location, if they have visited no targets 
yet, or (ii) move to their reference point, as determined by the set of targets 
previously visited. 
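The no-communication rule can be sketched in a few lines of code. This is an illustrative sketch only: the reference point is approximated with a Weiszfeld-type fixed-point iteration, which is a choice made here and is not prescribed by the paper. The names `weber_point`, `f_nc` and `pi_nc` are hypothetical, and the tie-breaking rule for non-unique minimizers is not implemented.

```python
import math

def vers(v):
    """Unit vector along v, or the zero vector if v is zero."""
    n = math.hypot(v[0], v[1])
    return (v[0] / n, v[1] / n) if n > 0 else (0.0, 0.0)

def weber_point(points, iters=500, eps=1e-12):
    """Approximate minimizer of the sum of distances to `points` (Weiszfeld-type iteration).

    Starts from the centroid. Distances are clamped at `eps` to avoid division
    by zero; a robust solver would treat iterates on data points separately.
    """
    x = sum(q[0] for q in points) / len(points)
    y = sum(q[1] for q in points) / len(points)
    for _ in range(iters):
        wx = wy = wsum = 0.0
        for qx, qy in points:
            d = max(math.hypot(x - qx, y - qy), eps)
            wx += qx / d
            wy += qy / d
            wsum += 1.0 / d
        x, y = wx / wsum, wy / wsum
    return (x, y)

def f_nc(p_i, visited, demand):
    """Point the agent heads for: nearest outstanding target, else its reference point."""
    if demand:
        return min(demand, key=lambda q: math.dist(p_i, q))
    if not visited:
        return p_i                      # no history yet: stay at the current location
    return weber_point(visited)

def pi_nc(p_i, visited, demand):
    """Unit-speed control input of the no-communication policy (sketch)."""
    goal = f_nc(p_i, visited, demand)
    return vers((goal[0] - p_i[0], goal[1] - p_i[1]))

# An agent at the origin with two outstanding targets heads for the nearer one.
print(pi_nc((0.0, 0.0), [], [(3.0, 4.0), (1.0, 0.0)]))
```

Each agent evaluates `pi_nc` using only its own position, its own visit history and the current demand, which is what makes the rule communication-free.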



3.2 A sensor-based control policy 

The control strategy in the previous section can be modified to include in-
formation on the current position of other agents, if available (e.g., through
on-board sensors). In order to present the new policy, indicate with $V(p) =
\{V_1(p), V_2(p), \dots, V_m(p)\}$ the Voronoi partition of the workspace $\Omega$, defined as:

$$V_i(p) = \{q \in \Omega : \|q - p_i\| \le \|q - p_j\|,\ \forall j = 1, \dots, m\}. \qquad (9)$$

As long as an agent has never visited any target, i.e., as long as $B_i(t) = \emptyset$,
it executes the $\pi_{nc}$ policy. Once an agent has visited at least one target, it
computes its own control input according to the following rule:

1. If $\mathcal{D}(t) \cap V_i(t)$ is not empty, move towards the nearest outstanding target
in the agent's own Voronoi region.

2. If $\mathcal{D}(t) \cap V_i(t)$ is empty, move towards the point minimizing the average
distance to targets in $B_i(t)$. If there is no unique minimizer, then move to
the nearest one.

In other words, we set 

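$$\pi_{sb}(p(t), B_i(t), \mathcal{D}(t)) = \mathrm{vers}\left(F_{sb}(p(t), B_i(t), \mathcal{D}(t)) - p_i(t)\right), \qquad (10)$$

where

$$F_{sb}(p, B_i, \mathcal{D}) = \begin{cases} \arg\min_{q \in \mathcal{D}} \|p_i - q\|, & \text{if } \mathcal{D} \cap V_i \neq \emptyset \text{ and } B_i = \emptyset, \\ \arg\min_{q \in \mathcal{D} \cap V_i(p)} \|p_i - q\|, & \text{if } \mathcal{D} \cap V_i \neq \emptyset \text{ and } B_i \neq \emptyset, \\ \arg\min_{q \in \Omega} \sum_{e \in B_i} \|e - q\|, & \text{otherwise.} \end{cases} \qquad (11)$$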



In the $\pi_{sb}$ policy, at most one agent will be pursuing a given target, at any
time after an initial transient that terminates when all agents have visited at
least one target each. The agents' behavior when no outstanding targets are
available in their Voronoi region is similar to that determined by the $\pi_{nc}$ policy
previously discussed, i.e., they move to their reference point, determined by
previously visited targets.



Remark 1 While we introduced Voronoi partitions in the definition of the con-
trol policy, the explicit computation of each agent's Voronoi region is not neces-
sary. In fact, each agent only needs to check whether it is the closest agent to a
given target or not. In order to check whether a target point $q$ is in the Voronoi
region of the i-th agent, it is necessary to know the current position only of
agents within a circle of radius $\|p_i - q\|$ centered at $q$ (see Figure 2). For exam-
ple, if such circle is empty, then $q$ is certainly in $V_i$; if the circle is not empty,
distances of the agents within it to the target must be compared. This provides
a degree of spatial decentralization, with respect to other agents, that is even
stronger than that provided by restricting communications to agents sharing a
boundary in a Voronoi partition (i.e., neighboring agents in the Delaunay graph,
dual to the partition (9)).

Figure 2: Implicit computation of Voronoi regions: Even though target $e_1$ is the
nearest target to $p_1$, it is not in the Voronoi region of the 1st agent. In fact, the
circle of radius $\|e_1 - p_1\|$ centered at $e_1$ contains $p_2$, and the 2nd agent is closer
to $e_1$. However, the circle of radius $\|e_2 - p_1\|$ centered at $e_2$ does not contain any
other agent, ensuring that $e_2$ is in the Voronoi region generated by $p_1$.
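The implicit Voronoi test of Remark 1 can be sketched as follows. This is a minimal illustration under the assumption that agent positions are available as a list of planar points; the function names `is_closest_agent` and `own_region_targets` are hypothetical. The sample coordinates are chosen to mimic the situation described in the caption of Figure 2, not taken from it.

```python
import math

def is_closest_agent(i, q, positions):
    """True if no other agent is strictly closer to target q than agent i."""
    r = math.dist(positions[i], q)
    # Only agents strictly inside the disc of radius r centred at q can beat agent i,
    # so agents outside that disc never need to be sensed.
    rivals = [p for j, p in enumerate(positions) if j != i and math.dist(p, q) < r]
    return not rivals

def own_region_targets(i, demand, positions):
    """Outstanding targets lying in the Voronoi region of agent i."""
    return [q for q in demand if is_closest_agent(i, q, positions)]

positions = [(0.0, 0.0), (2.0, 0.0)]       # p_1 and p_2
e1, e2 = (1.5, 0.0), (0.0, -2.0)           # e_1 is nearer to p_1 than e_2 is
print(is_closest_agent(0, e1, positions))  # agent 2 lies inside the disc around e_1
print(own_region_targets(0, [e1, e2], positions))
```

The check never builds the Voronoi partition explicitly; it only compares distances to the target in question.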

Remark 2 The sensor-based policy is more efficient than the no-communication 
policy in terms of the length of the path traveled by each agent, since there is 
no duplication of effort as several agents pursue the same target. However, in 
terms of "quality of service," we will show that there is no difference between
the two policies, for low target generation rates. Numerical results show that 
the sensor-based policy is more efficient in a broader range of target generation 
rates, and in fact provides almost optimal performance both in light and heavy 
load conditions. 

4 Performance analysis in light load 

In this section we analyze the performance of the control policies proposed 
in the previous section. In particular, we concentrate our investigation on the 
light load case, in which the target generation rate is very small, i.e., as $\lambda \to 0^+$.
This will allow us to prove analytically certain interesting and perhaps surprising 



characteristics of the proposed policies. The performance analysis in the general 
case is more difficult; we will discuss the results of numerical investigation in 
Section 6, but no analytical results are available at this time.

4.1 Overview of the system behavior in the light load regime

Before starting a formal analysis, let us summarize the key characteristics of the
agents' behavior in light load, i.e., for small values of $\lambda$.

1. At the initial time the $m$ agents are assumed to be deployed in general
position in $\Omega$, and the demand queue is empty, $\mathcal{D}(0) = \emptyset$.

2. The agents do not move until the first service request appears. At that
time, if the policy $\pi_{nc}$ is used, all agents will start moving towards the
first target. If the sensor-based policy $\pi_{sb}$ is used, only the closest agent
will move towards the target.

3. As soon as one agent reaches the target, all agents start moving towards
their current reference point, and the process continues.

For small $\lambda$, with high probability (i.e., with probability approaching 1 as
$\lambda \to 0$) at most one service request is outstanding at any given time. In other
words, new service requests are generated so rarely that most of the time agents
will be able to reach a target and return to their reference point before a new
service request is issued.

Consider the j-th service request, generated at time $t_j$. Assuming that at $t_j$
all agents are at their reference position, the expected system time $T_j$ can be
computed as

$$E[T_j] = \int_\Omega \min_{i \in \{1,\dots,m\}} \|p_i^*(t_j) - q\|\, \varphi(q)\,dq.$$

Assume for now that the sequences $\{p_i^*(t_j) : j \in \mathbb{N}\}$ converge, and let

$$\lim_{j\to\infty} p_i^*(t_j) = \hat{p}_i^*.$$

Note that $\hat{p}_i^*$ is a random variable, the value of which depends in general on the
particular realization of the target generation process. If all service requests are
generated with the agents at their reference position, the average service time
(for small $\lambda$) can be evaluated as

$$\bar{T} = \int_\Omega \min_{i \in \{1,\dots,m\}} \|\hat{p}_i^* - q\|\, \varphi(q)\,dq. \qquad (12)$$

Since the system time depends on the random variable $\hat{p}^* = (\hat{p}_1^*, \dots, \hat{p}_m^*)$, it is
itself a random variable. The function appearing on the right hand side of the
above equation, relating the system time to the asymptotic location of reference
points, is called the continuous multi-median function [20]. This function admits
a global minimum (in general not unique) for all non-singular density functions
$\varphi$, and in fact it is known [10] that the optimal performance in terms of system
time is given by

$$\bar{T}_{\mathrm{opt}} = \min_{p \in \Omega^m} \int_\Omega \min_{i \in \{1,\dots,m\}} \|p_i - q\|\, \varphi(q)\,dq. \qquad (13)$$
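The light-load system time can be estimated numerically. The sketch below takes the multi-median function to be the expected distance from a random target to the nearest reference point (the travel time at unit speed when agents start from their reference points), and assumes a uniform density on the unit square. The function name `multi_median_cost` and the chosen reference points are illustrative assumptions.

```python
import math
import random

def multi_median_cost(refs, samples):
    """Monte Carlo estimate: average distance from each sample to its nearest reference point."""
    return sum(min(math.dist(p, q) for p in refs) for q in samples) / len(samples)

rng = random.Random(1)
# Targets drawn from a uniform density on the unit square (assumption).
samples = [(rng.random(), rng.random()) for _ in range(20000)]

one_agent = multi_median_cost([(0.5, 0.5)], samples)
two_agents = multi_median_cost([(0.25, 0.5), (0.75, 0.5)], samples)
print(one_agent, two_agents)
```

For a single agent at the center of the unit square the estimate should be close to the known mean distance $(\sqrt{2} + \ln(1+\sqrt{2}))/6 \approx 0.383$, and splitting the square between two agents lowers the cost.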



In the following, we will investigate the convergence of the reference points as
new targets are generated, in order to draw conclusions about the average system
time $\bar{T}$ in light load. In particular, we will prove not only that the reference
points converge with high probability (as $\lambda \to 0$) to a local critical point (more
precisely, either local minima or saddle points) for the average system time,
but also that the limiting reference points $\hat{p}^*$ are generalized medians of their
respective Voronoi regions, where

Definition 3 (Generalized median) The generalized median of a set $S \subset \mathbb{R}^n$
with respect to a density function $\varphi : S \to \mathbb{R}_+$ is defined as

$$\bar{p} := \arg\min_{p \in \mathbb{R}^n} \int_S \|p - q\|\, \varphi(q)\,dq.$$

We call the resulting Voronoi tessellation Median Voronoi Tessellation (MVT
for short), in analogy with what is done with Centroidal Voronoi Tessellations.
A formal definition is as follows:

Definition 4 (Median Voronoi Tessellation) A Voronoi tessellation $V(p) =
\{V_1(p), \dots, V_m(p)\}$ of a set $S \subset \mathbb{R}^n$ is said to be a Median Voronoi Tessellation of $S$
with respect to the density function $\varphi$ if the ordered set of generators $p$ is equal
to the ordered set of generalized medians of the sets in $V(p)$ with respect to $\varphi$,
i.e., if

$$p_i = \arg\min_{s \in \mathbb{R}^n} \int_{V_i(p)} \|s - q\|\, \varphi(q)\,dq, \quad \forall i \in \{1, \dots, m\}.$$

Since the proof builds on a number of intermediate results, we provide an
outline of the argument as a convenience to the reader.

1. First, we prove that the reference point of any agent that visits an un-
bounded number of targets over time converges almost surely.

2. Second, we prove that, if $m > 1$ agents visit an unbounded number of
targets over time, their reference points will converge to the generators of
a MVT almost surely, as long as agents are able to return to their reference
point infinitely often.

3. Third, we prove that all agents will visit an unbounded number of targets
(this corresponds to a property of distributed algorithms that is often
called fairness in computer science).

4. Finally, we prove that agents are able to return to their reference point
infinitely often with high probability as $\lambda \to 0^+$.

Combining these steps, together with (12), will allow us to state that the reference
points converge to a local critical point of the system time, with high probability
as $\lambda \to 0^+$.

4.2 Convergence of reference points 

Let us consider an agent $i$, such that

$$\lim_{t\to\infty} \mathrm{card}(B_i(t)) = \infty,$$

i.e., an agent that services an unbounded number of requests over time. Since
the number of agents $m$ is finite, and the expected number of targets generated
over a time interval $[0, t)$ is proportional to $t$, at least one such agent will always
exist. In the remainder of this section, we will drop the subscript $i$, since we
will consider only this agent, effectively ignoring all others for the time being.

For any finite $t$, the set $B(t)$ will contain a finite number of points. As-
suming that $B(t)$ contains at least three non-collinear points, the discrete We-
ber function $p \mapsto \sum_{q \in B(t)} \|p - q\|$ is strictly convex, and has a unique opti-
mizer $p^*(t) = \arg\min_{p \in \Omega} \sum_{q \in B(t)} \|p - q\|$. The optimal point $p^*(t)$ is called the
Fermat-Torricelli (FT) point (or the Weber point in the location optimization
literature) associated with the set $B(t)$; see [21-23] for a historical review of
the problem and for solution algorithms.

It is known that the FT point is unique and algebraic for any set of non-
collinear points. While there are no general analytic solutions for the location
of the FT point associated to more than 4 points, numerical solutions can be
easily constructed relying on the convexity of the Weber function, and on the
fact that it is differentiable for all points not in $B$. Polynomial-time approxi-
mation algorithms are also available (see, e.g., [24,25]). Remarkably, a simple
mechanical device can be constructed to solve the problem, based on the so-
called Varignon frame, as follows. Holes are drilled on a horizontal board, at
locations corresponding to the points in $B$. A string attached to a unit mass is
passed through each of these holes, and all strings are tied together at one end.
The point reached by the knot at equilibrium is a FT point for $B$.

Some useful properties of FT points are summarized below. If there is a
$q_0 \in B$ such that

$$\left\| \sum_{q \in B \setminus q_0} \mathrm{vers}(q_0 - q) \right\| \le 1, \qquad (14)$$

then $p^* = q_0$ is a FT point for $B$. If no point in $B$ satisfies such condition, then
the FT point $p^*$ can be found as a solution of the following equation:

$$\sum_{q \in B} \mathrm{vers}(p^* - q) = 0. \qquad (15)$$

In other words, $p^*$ is simply the point in the plane at which the sum of unit
vectors starting from it and directed to each of the points in $B$ is equal to zero;
this point is unique if $B$ contains non-collinear points. Clearly, the FT point is
in the convex hull of $B$.
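The two optimality conditions for FT points can be checked directly for a candidate point. The sketch below reads them as follows: a data point is optimal when the sum of unit vectors towards it from the remaining points has norm at most one, and any other point is optimal when the unit vectors balance to zero. The helper names `unit_vector_sum` and `is_ft_point` and the tolerance are illustrative assumptions.

```python
import math

def unit_vector_sum(p, points):
    """Sum of the unit vectors vers(p - q) over q in points (q == p contributes zero)."""
    sx = sy = 0.0
    for qx, qy in points:
        d = math.hypot(p[0] - qx, p[1] - qy)
        if d > 0:
            sx += (p[0] - qx) / d
            sy += (p[1] - qy) / d
    return (sx, sy)

def is_ft_point(p, points, tol=1e-9):
    """Check the FT optimality conditions at p for the point set `points`."""
    norm = math.hypot(*unit_vector_sum(p, points))
    if any(math.dist(p, q) <= tol for q in points):
        return norm <= 1.0 + tol      # p coincides with a data point q0
    return norm <= tol                # unit vectors must balance exactly

# Equilateral triangle: the FT point is the centroid.
tri = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]
print(is_ft_point((0.5, math.sqrt(3) / 6), tri))

# Very obtuse triangle: the FT point is the vertex at the wide angle.
obtuse = [(0.0, 0.0), (1.0, 0.0), (-1.0, 0.1)]
print(is_ft_point((0.0, 0.0), obtuse), is_ft_point((1.0, 0.0), obtuse))
```

The second example illustrates the case in which the FT point coincides with one of the points in the set.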

Note that the points in $B(t)$ are randomly sampled from an unknown abso-
lutely continuous distribution, described by a spatial density function $\tilde{\varphi}$, which
is not necessarily the same as $\varphi$, and in general is time-varying, depending on
the past actions of all agents in the system. Even though $\tilde{\varphi}$ is not known, it can
be expressed as

$$\tilde{\varphi}(q) = \begin{cases} \dfrac{\varphi(q)}{\int_{\mathcal{T}(t)} \varphi(s)\,ds}, & \text{if } q \in \mathcal{T}(t), \\ 0, & \text{otherwise,} \end{cases}$$

for some convex set $\mathcal{T}(t)$ containing $p(t)$. (In practical terms, such set will be
the Voronoi region generated by $p(t)$).

The function $t \mapsto p^*(t)$ is piecewise constant, i.e., it changes value at the
discrete time instants $\{t_j : j \in \mathbb{N}\}$ at which the agent visits new targets. As a
consequence, we can concentrate on the sequence $\{p^*(t_j) : j \in \mathbb{N}\}$, and study
its convergence.

Definition 5 For any t > 0, let the solution set C(t) be defined as 



$$C(t) := \left\{ p \in Q : \left\| \sum_{q \in B(t)} \mathrm{vers}(p - q) \right\| \le 1 \right\}.$$




Figure 3: Example of a Fermat-Torricelli point (star) and solution set corre-
sponding to five target points (circles). Upon the addition of an arbitrarily
chosen sixth target point, the Fermat-Torricelli point is guaranteed to remain
within the region bounded by the curve.

An example of such a set is shown in Figure 3. The reason for introducing such
solution sets is that they have quite remarkable properties, as shown by the
following



Proposition 6 For any $j \in \mathbb{N}$, $p^*(t_{j+1}) \in C(t_j)$. More specifically, if $e_{j+1} \notin
C(t_j)$ (i.e., the target point associated to the (j+1)-th service request is outside the
solution set) then the FT point $p^*(t_{j+1})$ is on the boundary of $C(t_j)$. If $e_{j+1} \in
C(t_j)$, then $p^*(t_{j+1}) = e_{j+1}$.

Proof: If $e_{j+1}$ lies outside $C(t_j)$, we search for $p^*(t_{j+1})$ as the solution of
the equation

$$\sum_{q \in B(t_j)} \mathrm{vers}(p - q) + \mathrm{vers}(p - e_{j+1}) = 0,$$

from which it follows immediately that

$$\left\| \sum_{q \in B(t_j)} \mathrm{vers}(p - q) \right\| = \left\| -\mathrm{vers}(p - e_{j+1}) \right\| = 1,$$

thus $p^*(t_{j+1}) \in \partial C(t_j)$. Notice that it is not true in general that the solution
$p^*(t_{j+1})$ will lie on the line connecting $p^*(t_j)$ with the new target $e_{j+1}$. In the
other case, if $e_{j+1}$ lies in $C(t_j)$, then it satisfies condition (14), and is the new
FT point. ■
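The boundary property in Proposition 6 can be illustrated numerically. The sketch below uses four targets at the corners of the unit square plus one new, distant target; the FT point is approximated with a Weiszfeld-type iteration, which is a choice made here and not a method prescribed by the paper. All names and coordinates are illustrative assumptions.

```python
import math

def weber_point(points, iters=500, eps=1e-12):
    """Approximate FT point via a Weiszfeld-type iteration started at the centroid."""
    x = sum(q[0] for q in points) / len(points)
    y = sum(q[1] for q in points) / len(points)
    for _ in range(iters):
        wx = wy = wsum = 0.0
        for qx, qy in points:
            d = max(math.hypot(x - qx, y - qy), eps)   # clamp to avoid division by zero
            wx += qx / d
            wy += qy / d
            wsum += 1.0 / d
        x, y = wx / wsum, wy / wsum
    return (x, y)

def unit_sum_norm(p, points):
    """Norm of the sum of unit vectors vers(p - q), q in points."""
    sx = sy = 0.0
    for qx, qy in points:
        d = math.hypot(p[0] - qx, p[1] - qy)
        if d > 0:
            sx += (p[0] - qx) / d
            sy += (p[1] - qy) / d
    return math.hypot(sx, sy)

B = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]   # previously visited targets
e_new = (3.0, 0.5)                                      # newly visited target
outside = unit_sum_norm(e_new, B) > 1.0                 # e_new is not in the solution set
p_new = weber_point(B + [e_new])                        # updated FT point
boundary_value = unit_sum_norm(p_new, B)                # should be close to 1
print(outside, p_new, boundary_value)
```

Because the new target lies outside the solution set, the updated FT point sits where the unit vectors towards the old targets sum to a vector of norm one, i.e., on the boundary of the set.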
Now, in order to prove that the sequence $\{p^*(t_j)\}_{j \in \mathbb{N}}$ converges to a point $\hat{p}^*$, we will
prove that the diameter of the solution set $C(t_j)$ vanishes almost surely as $j$
tends to infinity. First we need the following result.

Proposition 7 If $Q = \mathrm{Supp}(\varphi)$ is convex with non-empty interior, then $p^*(t) \in
\mathrm{int}(Q)$ almost surely, for all $t$ such that $B(t) \neq \emptyset$.

Proof: For any non-empty target set $B(t)$, the FT point lies within the
convex hull of $B(t)$. All points in the set $B(t)$ are contained within the interior
of $Q$ with probability one, since the boundary of $Q$ is a set of measure zero. Since
$\mathrm{int}(Q)$ is convex, and $B(t) \subset \mathrm{int}(Q)$ almost surely, $p^*(t) \in \mathrm{co}(B(t)) \subset \mathrm{int}(Q)$,
almost surely. ■

Proposition 8 If the support of $\varphi$ is convex and bounded,

$$\lim_{j\to\infty} \mathrm{diam}(C(t_j)) = 0, \quad \text{a.s.}$$

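Proof: Consider a generic point $p \in C(t_j)$, and let $\delta = \|p - p^*(t_j)\|$,

$$\alpha_q = \arccos\left[\mathrm{vers}(p - p^*(t_j)) \cdot \mathrm{vers}(q - p^*(t_j))\right],$$

and

$$\alpha'_q = \arccos\left[\mathrm{vers}(p - p^*(t_j)) \cdot \mathrm{vers}(q - p)\right],$$

see Figure 4.

Figure 4: Geometric constructions in the proof of Proposition 8.

Since $p \in C(t_j)$, the magnitude of the sum of unit vectors $\sum_{q \in B(t_j)} \mathrm{vers}(p - q)$
is no more than one, and the following inequality is true:

$$\left| \left( \sum_{q \in B(t_j)} \mathrm{vers}(p - q) \right) \cdot \mathrm{vers}(p - p^*(t_j)) \right| = \left| \sum_{q \in B(t_j)} \left( \mathrm{vers}(p - q) \cdot \mathrm{vers}(p - p^*(t_j)) \right) \right| = \left| \sum_{q \in B(t_j)} \cos(\alpha'_q) \right| \le 1. \qquad (16)$$

Using elementary planar geometry, we obtain that

$$\alpha'_q - \alpha_q \ge \sin(\alpha'_q - \alpha_q) = \frac{\delta \sin(\alpha_q)}{\|q - p\|}.$$

Pick a small angle $0 < \alpha_{\min} < \pi/2$, and let

$$B_{\alpha_{\min}}(t_j) = \{q \in B(t_j) : \sin(\alpha_q) \ge \sin(\alpha_{\min})\}.$$

(In other words, $B_{\alpha_{\min}}(t_j)$ contains all points in $B(t_j)$ that are not in a conical region
of half-width $\alpha_{\min}$, as shown in Figure 4.) For all $q \in B(t_j)$, $\cos(\alpha'_q) \le \cos(\alpha_q)$;
moreover, for all $q \in B_{\alpha_{\min}}(t_j)$,

$$\cos(\alpha'_q) \le \cos(\alpha_q) - \sin(\alpha_{\min})(\alpha'_q - \alpha_q) \le \cos(\alpha_q) - \frac{\delta \sin(\alpha_{\min})^2}{\|q - p\|} \le \cos(\alpha_q) - \frac{\delta \sin(\alpha_{\min})^2}{\mathrm{diam}(Q) + \delta}.$$

Hence, summing over all $q \in B(t_j)$, we get:

$$\sum_{q \in B(t_j)} \cos(\alpha'_q) \le \sum_{q \in B(t_j)} \cos(\alpha_q) - \mathrm{card}(B_{\alpha_{\min}}(t_j))\, \frac{\delta \sin(\alpha_{\min})^2}{\mathrm{diam}(Q) + \delta}. \qquad (17)$$

Observe now that in any case

$$\left| \sum_{q \in B(t_j)} \cos(\alpha_q) \right| \le 1,$$

(it is zero in case $p^*(t_j) \notin B(t_j)$, and bounded in absolute value by one if
$p^*(t_j) \in B(t_j)$). Therefore, rearranging equation (17):

$$\mathrm{card}(B_{\alpha_{\min}}(t_j))\, \frac{\delta \sin(\alpha_{\min})^2}{\mathrm{diam}(Q) + \delta} \le \sum_{q \in B(t_j)} \cos(\alpha_q) - \sum_{q \in B(t_j)} \cos(\alpha'_q) \le 2.$$

Solving this inequality with respect to $\delta$ we get:

$$\delta \le \frac{2\, \mathrm{diam}(Q)}{\mathrm{card}(B_{\alpha_{\min}}(t_j)) \sin(\alpha_{\min})^2 - 2}. \qquad (18)$$

Since (i) $\alpha_{\min}$ is a positive constant, (ii) $p^*(t_j)$ is in the interior of $Q$, and
(iii) $\lim_{j\to\infty} \mathrm{card}(B(t_j)) = +\infty$, the right hand side of (18) converges to zero
with probability one. Since the bound holds for all points $p \in C(t_j)$, for all
$j \in \mathbb{N}$, the claim follows. ■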

In the previous proposition, we have proven that $\|p^*(t_{j+1}) - p^*(t_j)\|$ tends
to zero a.s. as $j \to \infty$, under some natural assumptions on the distribution $\varphi$
and its support. Unfortunately this is not sufficient to prove that the sequence
$\{p^*(t_j)\}_{j \in \mathbb{N}}$ is Cauchy; convergence of the sequence is however ensured by the
following

Proposition 9 The sequence {p*(tj)}j^jq converges almost surely. 

Proof: Since the sequence $\{p^*(t_j)\}_{j \in \mathbb{N}}$ takes values in a compact set (the
closure of $\mathrm{Supp}(\varphi)$), by the Bolzano-Weierstrass theorem there exists a
subsequence converging to a limit point $p^*$ in the compact set. Construct from
$\{p^*(t_j)\}_{j \in \mathbb{N}}$ a maximal subsequence converging to $p^*$, and call $J$ the set of
indices of this maximal subsequence. If the original sequence $\{p^*(t_j)\}_{j \in \mathbb{N}}$ is not
converging to $p^*$, then there exists an $L > 0$ such that $\|p^*(t_j) - p^*\| > L$, for
any $j \in \mathbb{N} \setminus J$, and this set of indices is unbounded. Take $\varepsilon := L/3 > 0$. We
have that $\|p^*(t_{k-1}) - p^*\| < \varepsilon$ for any sufficiently large $(k-1) \in J$; moreover,
$\|p^*(t_{k-1}) - p^*(t_k)\| < \varepsilon$, a.s., by Proposition 8. Choose a sufficiently large
$(k-1) \in J$, such that $k \in \mathbb{N} \setminus J$ (this is always possible since the
complementary set of $J$ is unbounded by the assumption of non-convergence). But now,
we have

\[ L \le \|p^*(t_k) - p^*\| \le \|p^*(t_k) - p^*(t_{k-1})\| + \|p^*(t_{k-1}) - p^*\| \le \tfrac{2}{3} L, \]

which is a contradiction. ■



4.3 Convergence to the generalized median 

By the discussion in the previous section, we know that the reference points of
all agents that visit an unbounded number of targets converge to a well-defined
limit. So do, trivially, the reference points of all agents that visit a bounded
set of targets. Hence, we know that the sequence of reference points $p_i^*(t_j)$
converges to a limit $p_i^*$, almost surely, for all $i \in \{1, \dots, m\}$. Let us denote by
$V_i(p^*(t_j))$ the Voronoi region associated to the generator $p_i^*(t_j)$, and by $V_i(p^*)$
the Voronoi region corresponding to the limit point $p_i^*$.



Proposition 10 If the limit reference points $p^* = (p_1^*, \dots, p_m^*)$ are distinct,
then the sequence of Voronoi partitions $\{V(p^*(t_j))\}_{j \in \mathbb{N}}$ converges to the Voronoi
partition generated by the limit of reference points, i.e.,

\[ \lim_{j \to \infty} V_i(p^*(t_j)) = V_i(p^*), \quad \text{a.s.} \]

Proof: The boundaries of regions in a Voronoi partition are algebraic curves 
that depend continuously on the generators, as long as these are distinct. Hence, 
under this assumption, almost sure convergence of the generators implies the 
almost sure convergence of the Voronoi regions. ■ 
As a next step, we wish to understand what is the relation between the
asymptotic reference positions and their associated Voronoi regions. More
precisely, let $A \subset \{1, \dots, m\}$ be the subset of indices of agents that visit an
unbounded number of targets; we want to prove that $p_i^*$ is indeed the generalized
median $\bar{p}_i$ associated to agent $i$, with respect to the limiting set $V_i(p^*)$ and
distribution $\varphi(x)$, $\forall i \in A$. First we need the following technical result.

Lemma 11 Let $\{f_i\}_{i \in \mathbb{N}} : \Omega \to \mathbb{R}$ be a sequence of strictly convex continuous
functions, defined on a common compact subset $\Omega \subset \mathbb{R}^n$. Assume that each $f_i$
has a unique $x_i := \arg\min_{\Omega} f_i$ belonging to the interior of $\Omega$ for any $i$ and that
this sequence of functions converges uniformly to a continuous strictly convex
function $f$ admitting a unique minimum point $\bar{x}$ belonging also to the interior
of $\Omega$. Then $\lim_{i \to \infty} x_i = \bar{x}$.

Proof: Since $\{f_i\}_{i \in \mathbb{N}}$ converges uniformly to $f$, then for any $\varepsilon > 0$, there
exists an $I(\varepsilon)$ such that for any $i \ge I(\varepsilon)$, $\|f_i - f\| < \varepsilon$ uniformly in $x \in \Omega$. Let
$m := f(\bar{x})$, the minimum value achieved by $f$. Consider the set

\[ U_\varepsilon := \{ x \in \Omega \text{ such that } f(x) \le m + 2\varepsilon \}. \]

Since $f$ is strictly convex, for $\varepsilon$ sufficiently small, $U_\varepsilon$ will be a compact subset
contained in $\Omega$. Moreover, since $f$ is strictly convex, we have that $U_\varepsilon$ is strictly
included in $U_{\varepsilon'}$, whenever $\varepsilon' > \varepsilon$ and both are sufficiently small. It is also clear
that $\lim_{\varepsilon \to 0} U_\varepsilon = \bar{x}$ (nested strictly decreasing sequence of compact subsets all
containing the point $\bar{x}$). If $\|f_i - f\| < \varepsilon$, we claim that $x_i \in U_\varepsilon$; we prove this
by contradiction. Since $f_i \ge f - \varepsilon$, if $x_i$ does not belong to $U_\varepsilon$, then
$\min(f_i) = f_i(x_i) > m + \varepsilon$ (this is just because $U_\varepsilon$ is simply the set where the
function $(f - \varepsilon) \le m + \varepsilon$). But since $f_i \le f + \varepsilon$ it turns out that
$f_i(\bar{x}) \le m + \varepsilon < \min(f_i)$, which is a contradiction. ■
We conclude this section with the following: 

Proposition 12 Assume that all agents in $A$ are able to return infinitely often
to their reference point between visiting two targets. Then, the limit reference
points of such agents coincide, almost surely, with the generalized medians of
their limit Voronoi regions, i.e.,

\[ p_i^* = \arg\min_{p \in Q} \int_{V_i(p^*)} \|p - q\| \, \varphi(q) \, dq, \quad \text{a.s.}, \quad \forall i \in A. \]



Proof: For any $i \in A$, define the functions
$f_{i,t_j}(p) := \frac{1}{j} \sum_{q \in B_i(t_j)} \|p - q\|$ and
$f_i(p) := \int_{V_i(p^*)} \|p - q\| \, \varphi(q) \, dq$. These functions are continuous and well
defined; we restrict their domains of definition to the compact set $Q = \mathrm{Supp}(\varphi)$.
These functions are also strictly convex and have unique minima in the interior
of $Q$. Let us notice that, with our previous notation, we have that
$p_i^*(t_j) = \arg\min f_{i,t_j}(p)$ and $\bar{p}_i = \arg\min f_i(p)$. Observe that the functions
$f_{i,t_j}(p)$ and $f_i(p)$ can be considered random variables with respect to a
probability space whose space of events coincides with all possible realizations
of target sequences. Consider a restriction of these random variables to a new
probability space whose space of events coincides with all possible realizations
of target sequences for which the corresponding FT points converge to a limiting
point. On this new probability space the random variable $f_i(p)$ becomes a
deterministic function which is the expected value of the random variables
$f_{i,t_j}(p)$. Since $Q$ is compact, it is immediate to see that the $f_{i,t_j}(p)$ have finite
expectation and variance over this reduced probability space, and by the Strong
Law of Large Numbers we can conclude that almost surely (over this reduced
probability space) $f_{i,t_j}(p)$ converges pointwise to $f_i(p)$. To show that $f_{i,t_j}(p)$
converges pointwise to $f_i(p)$ over the original probability space, it is
sufficient to observe the following. The original probability space is the
probability space whose space of events coincides with all possible realizations
of target sequences. We already know that almost surely the FT points associated
to any possible realization of target sequences will converge. So we can fiber
the space of events of the first probability space into spaces of events of
reduced probability spaces, except for a set of measure zero. This is sufficient
to prove that $f_{i,t_j}(p)$ converges pointwise to $f_i(p)$ almost surely with respect to
all possible realizations of target sequences.

Now that we have proved that almost surely the sequence $\{f_{i,t_j}(p)\}_{j \in \mathbb{N}}$
converges pointwise to $f_i(p)$, we prove that it does converge uniformly. To
do this, we use a theorem, usually attributed to Dini-Arzelà, which states the
following: an equicontinuous sequence of functions converges uniformly to a
continuous function on a compact set $Q$ if and only if it converges pointwise
to a continuous function on the same compact set. Our sequence $f_{i,t_j}(p)$ is
equicontinuous if $\forall \varepsilon > 0$ and $\forall p \in Q$ there exists a $\delta > 0$ such that for all
$j \in \{1, \dots, n\}$ and for all $p' \in Q$ with $\|p' - p\| < \delta$, we have
$\|f_{i,t_j}(p) - f_{i,t_j}(p')\| < \varepsilon$; observe that $\delta$ is independent of $j$, while in
general it will depend on $\varepsilon$ and on $p$. Now we have

\[ \|f_{i,t_j}(p) - f_{i,t_j}(p')\| \le \frac{1}{j} \sum_{q \in B_i(t_j)} \big| \|p - q\| - \|p' - q\| \big|. \]

Using

\[ \|q - p'\| = \|q - p + p - p'\| \le \|q - p\| + \|p - p'\| \]

and

\[ \|q - p\| = \|q - p' + p' - p\| \le \|q - p'\| + \|p - p'\|, \]

it is immediate to see that

\[ \|f_{i,t_j}(p) - f_{i,t_j}(p')\| \le \frac{1}{j} \sum_{q \in B_i(t_j)} \|p - p'\| \le \|p - p'\|. \]

So it is sufficient to take $\delta = \varepsilon$ in the previous definition, and $\delta$ does not
depend on $j$. So the sequence is equicontinuous and the pointwise convergence is
upgraded to uniform convergence.

We already know that almost surely the points $p_i^*(t_j)$ do converge to points $p_i^*$
(Proposition 9); therefore we can claim that $p_i^* = \bar{p}_i$, simply applying Lemma 11,
which requires the uniform convergence. Thus we can claim that the
reference position of each agent which services infinitely many targets following
our algorithm converges to the generalized median of its Voronoi region, almost
surely. ■
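As a numerical illustration of Proposition 12 (a sketch under simplifying assumptions, not the procedure used in the paper), the following Python fragment computes the empirical FT point of samples drawn uniformly from a unit square and measures its distance from the center of the square, which by symmetry is the generalized median of that region. The function names, the use of the Weiszfeld fixed-point iteration, and the choice of region are assumptions made here for illustration.

```python
import math
import random


def weiszfeld(points, iters=200):
    """Approximate the minimizer of p -> sum_q ||p - q|| (the FT point)."""
    # Start from the centroid of the samples.
    x = sum(q[0] for q in points) / len(points)
    y = sum(q[1] for q in points) / len(points)
    for _ in range(iters):
        wx = wy = wsum = 0.0
        for qx, qy in points:
            d = math.hypot(x - qx, y - qy)
            if d == 0.0:
                continue  # simplification: ignore a sample equal to the iterate
            wx += qx / d
            wy += qy / d
            wsum += 1.0 / d
        if wsum == 0.0:
            break
        x, y = wx / wsum, wy / wsum
    return x, y


def median_error(n, seed=0):
    """Distance between the empirical FT point of n uniform samples and (0.5, 0.5)."""
    rng = random.Random(seed)
    samples = [(rng.random(), rng.random()) for _ in range(n)]
    mx, my = weiszfeld(samples)
    return math.hypot(mx - 0.5, my - 0.5)
```

Calling `median_error(n)` for increasing `n` should show the empirical FT point approaching the center of the square, in line with the almost-sure convergence stated above.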

4.4 Fairness 

In this section, we prove that, as long as $\varphi$ is strictly positive over a convex set,
both policies introduced in Section 3 are fair.

Proposition 13 (Fairness) If $Q = \mathrm{Supp}(\varphi)$ is convex, all agents eventually
visit an unbounded number of targets, almost surely, i.e.,

\[ \lim_{t \to +\infty} \operatorname{card}(B_i(t)) = +\infty \quad \text{a.s.}, \quad \forall i \in \{1, \dots, m\}. \]

Proof: Under either policy, each agent will pursue and is guaranteed to
service the nearest target in its own Voronoi region. Hence, in order to show
that an agent services an unbounded number of targets, it is sufficient to show
that the probability that the next target be generated within its Voronoi region
remains strictly positive.

Since $Q$ is convex, and all agents move towards the nearest target, at least
initially, all agents will eventually enter $Q$, and remain within it. Let us denote
with $P_{ij}$ the probability that agent $i$ visits the $j$-th target. For any $i$ this
probability is always strictly positive. Indeed, even if it happens that some
agents are simultaneously servicing the same target (simply because the service
request appears at the boundary of two different Voronoi regions), this does not
mean that their reference points have to coincide. For the reference points of two
agents eventually to coincide, it must happen that they service the same target
simultaneously infinitely often. Since the boundaries of Voronoi
regions have measure zero and $\varphi$ is a continuous distribution without singular
components, we can claim that $\lim_{j \to \infty} P_{ij} > 0$ almost surely. Now call
$P = \min_{i = 1, \dots, m} \lim_{j \to \infty} P_{ij}$. Then $P$ is strictly positive almost surely. Therefore, the
probability that the $i$-th agent does not visit an unbounded number of targets
is bounded from above by $\lim_{j \to \infty} \prod_{k=1}^{j} (1 - P)^k = 0$, almost surely. ■




4.5 Efficiency 



In this section, we will prove that the system time provided by either one of the
algorithms in Section 3 converges to a critical point (either a saddle point or a
local minimum) with high probability as $\lambda \to 0$.

In the preceding sections, we have proved that, as long as each agent is able
to return to its reference point between servicing two targets infinitely often,
the reference points $p_i^*(t_j)$ converge to points $p_i^*$, which generate a MVT. In
such case, we know that the average time of service will converge to

\[ T_\pi = \int_Q \min_{i = 1, \dots, m} \|p_i^* - q\| \, \varphi(q) \, dq
 = \sum_{i=1}^{m} \int_{V_i(p^*)} \|p_i^* - q\| \, \varphi(q) \, dq. \tag{19} \]

Consider now functions $\mathcal{H}_m$ of the form:

\[ \mathcal{H}_m(p_1, \dots, p_m) = \int_Q \min_{i = 1, \dots, m} \|p_i - q\| \, \varphi(q) \, dq
 = \sum_{i=1}^{m} \int_{V_i(p)} \|p_i - q\| \, \varphi(q) \, dq. \tag{20} \]

Observe that $T_\pi$ belongs to the class of functions of the form $\mathcal{H}_m$ where each
point $p_i$ is constrained to be the generalized median of the corresponding Voronoi
region (i.e., $V(p)$ is a MVT).

We want to prove that $T_\pi$ is a critical point of $\mathcal{H}_m$. To do so, we consider
an extension of $\mathcal{H}_m$, i.e., a functional $\mathcal{K}_m$ defined as follows:

\[ \mathcal{K}_m(p_1, \dots, p_m, V_1, \dots, V_m) := \sum_{i=1}^{m} \int_{y \in V_i} \|y - p_i\| \, \varphi(y) \, dy. \]

Observe that in this case the regions $\{V_i\}_{i = 1, \dots, m}$ are not restricted to form
a MVT with respect to the generators $\{p_i\}_{i = 1, \dots, m}$. Thus we can view the
functional $\mathcal{H}_m$ we are interested in as a constrained form of the unconstrained
functional $\mathcal{K}_m$. It turns out therefore that critical points of $\mathcal{K}_m$ are also critical
points of $\mathcal{H}_m$. With respect to critical points of $\mathcal{K}_m$ we have the following result:

Proposition 14 Let $\{p_i\}_{i = 1, \dots, m}$ denote any set of $m$ points belonging to $\mathrm{Supp}(\varphi)$
and let $\{V_i\}_{i = 1, \dots, m}$ denote any tessellation of $\mathrm{Supp}(\varphi)$ into $m$ regions.
Moreover, let us define $\mathcal{K}_m$ as above.

Then a sufficient condition for $\{p_1, \dots, p_m, V_1, \dots, V_m\}$ to be a critical point
(either a saddle point or a local minimum) is that the $V_i$'s are the Voronoi
regions corresponding to the $p_i$'s, and, simultaneously, the $p_i$'s are the generalized
medians of the corresponding $V_i$'s.

Proof: Consider first the variation of $\mathcal{K}_m$ with respect to a single point, say
$p_i$. Now let $v$ be a vector in $\mathbb{R}^2$, such that $p_i + \varepsilon v \in Q$. Then we have

\[ \mathcal{K}_m(p_i + \varepsilon v) - \mathcal{K}_m(p_i) = \int_{V_i} \{ \|y - p_i - \varepsilon v\| - \|y - p_i\| \} \, \varphi(y) \, dy, \]

where we have not listed the other variables on which $\mathcal{K}_m$ depends since they
remain constant in this variation. By the very form of this variation, it is clear
that if the point $p_i$ is the generalized median for the fixed region $V_i$, we will
have that $\mathcal{K}_m(p_i + \varepsilon v) - \mathcal{K}_m(p_i) \ge 0$, for any $v$. Now consider the points
$\{p_i\}_{i = 1, \dots, m}$ fixed and consider a tessellation $\{U_i\}_{i = 1, \dots, m}$ different from the
Voronoi regions $\{V_i\}_{i = 1, \dots, m}$ generated by the points $p_i$. We compare the value
of $\mathcal{K}_m(p_1, \dots, p_m, V_1, \dots, V_m)$ with the value of $\mathcal{K}_m(p_1, \dots, p_m, U_1, \dots, U_m)$.
Consider those $y$ which belong to the Voronoi region $V_j$ generated by $p_j$, and
possibly not to the Voronoi region of another $p_i$. Since $\{U_i\}$ is not a
Voronoi tessellation, it can happen that these $y$ nevertheless belong to $U_i$. Thus
for these particular $y$ we have $\varphi(y) \|y - p_j\| \le \varphi(y) \|y - p_i\|$. Moreover, since
$\{U_i\}_{i = 1, \dots, m}$ is not the Voronoi tessellation associated to the $p_i$'s, the last
inequality must be strict over some set of positive measure. Thus we have that
$\mathcal{K}_m(p_1, \dots, p_m, V_1, \dots, V_m) < \mathcal{K}_m(p_1, \dots, p_m, U_1, \dots, U_m)$, and therefore $\mathcal{K}_m$ is
minimized, keeping the $p_i$'s fixed, exactly when the subsets $V_i$ are chosen to be
the Voronoi regions associated with the points $p_i$. ■

By the previous proposition and by the fact that critical points of the
unconstrained functional $\mathcal{K}_m$ are also critical points of the constrained functional $\mathcal{H}_m$,
we have that the MVT are always critical points for the functional $\mathcal{H}_m$, and in
particular $T_\pi$ is either a saddle point or a local minimum for the functional $\mathcal{H}_m$.
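The second half of the proof of Proposition 14 (for fixed generators, the Voronoi regions minimize the cost over all tessellations) can be checked numerically. The sketch below is only an illustration under assumptions made here: a uniform density on the unit square, three hypothetical generator positions, a Monte Carlo estimate of the integral, and a tessellation into vertical strips as the competing non-Voronoi partition.

```python
import math
import random


def nearest_generator(y, generators):
    """Index of the generator closest to y (the Voronoi assignment)."""
    return min(range(len(generators)),
               key=lambda i: math.dist(y, generators[i]))


def vertical_strip(y, generators):
    """A non-Voronoi tessellation: equal vertical strips of the unit square."""
    m = len(generators)
    return min(int(y[0] * m), m - 1)


def estimate_cost(samples, generators, assign):
    """Monte Carlo estimate of sum_i int_{V_i} ||y - p_i|| phi(y) dy (uniform phi)."""
    total = 0.0
    for y in samples:
        total += math.dist(y, generators[assign(y, generators)])
    return total / len(samples)


rng = random.Random(1)
samples = [(rng.random(), rng.random()) for _ in range(20000)]
generators = [(0.2, 0.3), (0.7, 0.8), (0.5, 0.1)]  # hypothetical positions
k_voronoi = estimate_cost(samples, generators, nearest_generator)
k_strips = estimate_cost(samples, generators, vertical_strip)
```

Because the Voronoi assignment picks the nearest generator for every sample, `k_voronoi` can never exceed `k_strips`, whatever the sample set; this mirrors the pointwise inequality used in the proof.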

Before we conclude, we need one last intermediate result. 

Proposition 15 Each agent will be able to return to its reference point before
the generation of a new service request infinitely often with high probability as
$\lambda \to 0^+$.

Proof: Let $t_1$ be such that $B_i(t_1) \neq \emptyset$, for all $i \in \{1, \dots, m\}$. Such time
exists, almost surely, because of the fairness of the proposed policies. At time
$t_1$ all agents will be within $Q$. Let $n_1 = \operatorname{card}(\mathcal{D}(t_1))$ be the total number of
outstanding targets at time $t_1$. An upper bound on the time needed to visit all
targets in $\mathcal{D}(t_1)$ is $n_1 \operatorname{diam}(Q)$. When there are no outstanding targets, agents
move to their reference points, reaching them in at most $\operatorname{diam}(Q)$ units of time.

The time needed to service the initial targets and go to the reference
configuration is hence bounded by $t_{ini} \le t_1 + (n_1 + 1) \operatorname{diam}(Q)$. The probability that
at the end of this initial phase the number of targets is reduced to zero is

\[ P[n(t_{ini}) = 0] = \exp(-\lambda (t_{ini} - t_1)) \ge \exp(-\lambda (n_1 + 1) \operatorname{diam}(Q)), \]

that is, $P[n(t_{ini}) = 0] \to 1^-$ as $\lambda \to 0^+$. As a consequence, after an initial
transient, all targets will be generated with all agents waiting at their reference
points, and an empty demand queue, with high probability as $\lambda \to 0^+$. ■
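To get a feeling for how quickly the lower bound in the proof of Proposition 15 approaches one, the following sketch evaluates $\exp(-\lambda (n_1 + 1) \operatorname{diam}(Q))$ for decreasing values of $\lambda$. The values $n_1 = 5$ and the unit-square diameter $\sqrt{2}$ are assumptions chosen purely for illustration.

```python
import math


def empty_queue_lower_bound(lam, n1, diam):
    """Lower bound exp(-lam * (n1 + 1) * diam) on P[n(t_ini) = 0]."""
    return math.exp(-lam * (n1 + 1) * diam)


diam_unit_square = math.sqrt(2.0)  # diameter of a unit square (assumed environment)
rates = (1.0, 0.1, 0.01, 0.001)
bounds = [empty_queue_lower_bound(lam, n1=5, diam=diam_unit_square)
          for lam in rates]
```

The entries of `bounds` increase monotonically towards one as the rate decreases, which is the light-load behaviour the proposition relies on.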
We can now conclude with the following:

Theorem 16 (Efficiency) The system time provided by the no-communication
policy $\pi_{nc}$ and by the sensor-based policy $\pi_{sb}$ converges to a critical point (either
a saddle point or a local minimum) with high probability as $\lambda \to 0$.

Proof: Combining results in Propositions 9 and 12 we conclude that the
reference points of all agents that visit an unbounded number of targets converge
to a MVT, almost surely, provided agents can return to the reference point
between visiting targets. Moreover, the fairness result in Proposition 13 shows
that in fact all agents do visit an unbounded number of targets almost surely;
as a consequence, by Proposition 14, the limit configuration is indeed a critical
point for the system time. Since agents return infinitely often to their reference
positions with high probability as $\lambda \to 0$, the claim is proven. ■
Thus we have proved that the suggested algorithm enables the agents to
realize a coordinated task, such as "minimizing" the cost function, without
explicit communication, or with mutual position knowledge only. Let us
underline that, in general, the achieved critical point strictly depends on the initial
positions of the agents inside the environment $Q$. It is known that the function
$\mathcal{H}_m$ admits (not unique, in general) global minima, but the problem of finding them
is NP-hard.

Remark 17 We cannot exclude that the algorithm so designed will indeed
converge to a saddle point instead of a local minimum. This is due to the fact that
the algorithm provides a sort of implementation of the steepest descent method,
where, unfortunately, we are not following the steepest direction of the gradient of
the function $\mathcal{H}_m$, but just the gradient with respect to one of the variables. For a
broader point of view of steepest descent in this framework see for instance [26].

On the other hand, since the algorithm is based on a sequence of targets
and at each phase we are trying to minimize a different cost function, it can be
proved that the critical points reached by this algorithm are no worse than the
critical points reached knowing a priori the distribution $\varphi$. This is a
remarkable result proved in a different context in [27], which also presents an
example in which the use of a sample sequence provides a better result (with
probability one) than the a priori knowledge of $\varphi$. In that specific example the
algorithm with the sample sequence does converge to a global minimum, while
the algorithm based on the a priori knowledge of the distribution $\varphi$ gets stuck
in a saddle point.

4.6 A comparison with algorithms for vector quantization 
and centroidal Voronoi tessellations 

The use of Voronoi tessellations is ubiquitous in many fields of science,
ranging from operations research, animal ethology (territorial behaviour of animals),
computer science (design of algorithms), to numerical analysis (construction of
adaptive grids for PDEs and general quadrature rules), and algebraic
geometry (moduli spaces of abelian varieties). For a detailed account of possible
applications see for instance the book [26]. In the available literature, most of
the analysis is devoted to applications of centroidal Voronoi tessellations, i.e.,
Voronoi tessellations such that each generator $p_i$ coincides with the centroid,
weighted by $\varphi$, of its own region $V_i$, for all $i \in \{1, \dots, m\}$.



A popular algorithm due to Lloyd [18] is based on the iterative computa- 
tion of centroidal Voronoi tessellations. The algorithm can be summarized as 
follows. Pick m generator points, and consider a large number n of samples 
from a certain distribution. At each step of the algorithm generators are moved
towards the centroid of the samples inside their respective Voronoi region. The 
algorithm terminates when each generator is within a given tolerance from the 
centroid of samples in its region, thus obtaining a centroidal Voronoi tessellation 
weighted by the sample distribution. There is also a continuous version of the 
algorithm, which requires the a priori knowledge of a spatial density function, 
and computation of the gradient of the polar moments of the Voronoi regions 
with respect to the positions of the generators. An application to coverage 
problems in robotics and sensor networks of Lloyd's algorithm is available in 
[13]. 
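The sample-based version of Lloyd's algorithm summarized above can be sketched as follows. This is a minimal illustration, not the implementation of [18] or [13]: it assumes that each generator jumps all the way to the centroid of its samples at every step (rather than moving partially towards it), that a generator with no assigned samples stays where it is, and the function and parameter names are chosen here for convenience.

```python
def lloyd(samples, generators, tol=1e-6, max_iter=100):
    """Sample-based Lloyd iteration on planar points (a simplified sketch)."""
    gens = [tuple(g) for g in generators]
    for _ in range(max_iter):
        # Assign every sample to its nearest generator (Voronoi region).
        buckets = [[] for _ in gens]
        for s in samples:
            i = min(range(len(gens)),
                    key=lambda k: (s[0] - gens[k][0]) ** 2 + (s[1] - gens[k][1]) ** 2)
            buckets[i].append(s)
        # Move each generator to the centroid of the samples in its region.
        new_gens = []
        for g, b in zip(gens, buckets):
            if b:
                new_gens.append((sum(p[0] for p in b) / len(b),
                                 sum(p[1] for p in b) / len(b)))
            else:
                new_gens.append(g)  # empty region: keep the generator in place
        shift = max(((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
                    for a, b in zip(gens, new_gens))
        gens = new_gens
        if shift < tol:  # every generator is within tolerance of its centroid
            break
    return gens
```

For example, with samples `[(0, 0), (0, 1), (10, 0), (10, 1)]` and initial generators `[(1, 0), (9, 0)]`, the iteration settles on the two cluster centroids after a single update.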

The algorithms we introduced in this paper are more closely related to an
algorithm due to MacQueen [19], originally designed as a simple online
adaptive algorithm to solve clustering problems, and later used as the method of
choice in several vector quantization problems where little information about
the underlying geometric structure is available. MacQueen's algorithm can be
summarized as follows. Pick $m$ generator points. Then iteratively sample points
according to the probability density function $\varphi$. Assign the sampled point to
the nearest generator, and update the latter by moving it in the direction of the
sample. In other words, let $q_j$ be the $j$-th sample, let $i^*(j)$ be the index of the
nearest generator, and let $c = (c_1, \dots, c_m)$ be a counter vector, initialized to a
vector of ones. The update rule takes the form

\[ p_{i^*(j)} \leftarrow \frac{c_{i^*(j)} \, p_{i^*(j)} + q_j}{c_{i^*(j)} + 1}, \qquad
 c_{i^*(j)} \leftarrow c_{i^*(j)} + 1. \]

The process is iterated until some termination criterion is reached. Compared
to Lloyd's algorithm, MacQueen's algorithm has the advantage of being a learning
adaptive algorithm, not requiring the a priori knowledge of the distribution
of the objects, but rather allowing the online generation of samples. It can be
recognized that the update rule in MacQueen's algorithm corresponds to moving
the generator points to the centroids of the samples assigned to them.
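A minimal sketch of the online update just described is given below. It follows the running-mean form of the rule with counters initialized to one, so that the initial generator position counts as one sample; the function name and the return format are assumptions made here.

```python
def macqueen(samples, generators):
    """Online MacQueen-style update: one pass over the samples, in order."""
    gens = [list(g) for g in generators]
    counts = [1] * len(gens)  # counter vector initialized to ones
    for q in samples:
        # Index of the nearest generator for the current sample.
        i = min(range(len(gens)),
                key=lambda k: (q[0] - gens[k][0]) ** 2 + (q[1] - gens[k][1]) ** 2)
        c = counts[i]
        # Running-mean update: p <- (c * p + q) / (c + 1), then c <- c + 1.
        gens[i][0] = (c * gens[i][0] + q[0]) / (c + 1)
        gens[i][1] = (c * gens[i][1] + q[1]) / (c + 1)
        counts[i] = c + 1
    return [tuple(g) for g in gens], counts
```

With generators `[(0, 0), (10, 0)]` and samples `(2, 0)`, `(8, 0)`, `(4, 0)`, the first generator ends at `(2, 0)`, the mean of its initial position and the two samples assigned to it, and the second at `(9, 0)`.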

The algorithm we propose is very similar in spirit to MacQueen's algorithm;
however, there is a remarkable difference. MacQueen's algorithm deals with
centroidal Voronoi tessellations, thus with the computation of $k$-means. Our
algorithm instead is based on MVT, and on the computation of $k$-medians. In
general, very little is known for Voronoi diagrams generated using simply the
Euclidean distance instead of the square of the Euclidean distance. For instance,
the medians of a sequence of points can exhibit a quite peculiar behavior if
compared to that of the means. Consider the following example. Given
a sequence of points $\{q_i\}_{i \in \mathbb{N}}$ in a compact set $K \subset \mathbb{R}^2$, we can construct the
induced sequence of means:

\[ m_N := \frac{1}{N} \sum_{i=1}^{N} q_i, \]



and analogously the induced sequence of FT points we considered in the previous
sections. Call $FT_N$ the FT point corresponding to the first $N$ points of the
sequence $\{q_i\}_{i \in \mathbb{N}}$. We want to point out that the induced sequences $\{m_j\}$ and $\{FT_j\}$
have a very different behaviour. Indeed, the induced sequence of means will
always converge as long as the points $q_j$ belong to a compact set. To see this,
just observe that if $\operatorname{diam}(K) \le L$, then $\|m_j\| \le L$. Moreover, it is immediate to
see that $\|m_{N+1} - m_N\| = \|q_{N+1} - m_N\| / (N+1) \le L/(N+1)$. Then one can conclude using
the same argument of Proposition 9. On the other hand, one can construct a special sequence of points
$q_j$ in a compact set $K$ for which the induced sequence of FT points does not
converge. This is essentially due to the fact that while the contribution of each
single point in moving the position of the mean decreases as $j$ increases, this
need not happen in the case of the median. To give a simple example, start with
the following configuration of points in $\mathbb{R}^2$: $q_1 = (1, 0)$, $q_2 = (-1, 0)$, $q_3 = (0, 1)$
and $q_4 = (0, -1)$. Then the sequence of points $q_j$ continues in the following
way: $q_k = (0, 1)$ if $k > 4$ and $k$ is odd, $q_k = (0, -1)$ if $k > 4$ and $k$ is even. Using
the characterization of FT points, it is immediate to see that $FT_k = (0, 0)$ for
$k > 4$ and $k$ even, while $FT_k = (0, \tan(\pi/6))$ for $k > 4$ and $k$ odd, so the
induced sequence cannot converge. This phenomenon cannot happen to the
sequence of means, which is instead always convergent. Therefore, it should
be clear that the use of MVT instead of centroidal Voronoi tessellations makes it
much more difficult to deal with the technical aspects of the algorithm such as
its convergence.
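The oscillation in this example can be reproduced numerically. The sketch below builds the first $k$ points of the sequence and approximates their FT point with a Weiszfeld fixed-point iteration; the iteration scheme, the starting point (the mean of the points), and the function names are choices made here for illustration and are not taken from the paper.

```python
import math


def example_points(k):
    """First k points of the sequence in the example (k >= 4)."""
    pts = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
    for n in range(5, k + 1):
        pts.append((0.0, 1.0) if n % 2 == 1 else (0.0, -1.0))
    return pts


def ft_point(points, iters=2000):
    """Weiszfeld iteration for the minimizer of the sum of distances."""
    x = sum(q[0] for q in points) / len(points)
    y = sum(q[1] for q in points) / len(points)
    for _ in range(iters):
        wx = wy = wsum = 0.0
        for qx, qy in points:
            d = math.hypot(x - qx, y - qy)
            if d == 0.0:
                continue  # simplification: skip a point equal to the iterate
            wx += qx / d
            wy += qy / d
            wsum += 1.0 / d
        if wsum == 0.0:
            break
        x, y = wx / wsum, wy / wsum
    return x, y


ft_odd = ft_point(example_points(5))    # expected near (0, tan(pi/6))
ft_even = ft_point(example_points(6))   # expected near (0, 0)
```

For odd $k$ the stationarity condition along the vertical axis reads $2y/\sqrt{1 + y^2} = 1$, whose solution is $y = 1/\sqrt{3} = \tan(\pi/6)$, independently of $k$; this is why the FT point keeps jumping between the two locations.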



5 A game-theoretic point of view 

In this section we provide an analysis of the proposed algorithm from the 
point of view of game theory. In particular, we base our presentation on the
works [7], [28], in which the point of view of game theory has been introduced
in the study of cooperative control and strategic coordination of decentralized
networks of multi-agent systems. In this section we prove that our algorithm
provides a pure Nash equilibrium in a multi-player game where each agent is 
interested in maximizing its own utility. On the other hand, our multi-player 
game formulation is much simpler than the usual framework considered in the 
literature, since there will be no negotiation mechanism among the agents. De- 
spite this fact it turns out that in this example, just trying to maximize their 
own utility function the agents will indeed maximize a different global utility 
function. 

In this section we view the agents as rational autonomous decision makers
trying to maximize their own utility function. The utility function of agent $i$,
denoted by $U_i$, is simply the expected number of service requests handled by agent
$i$ within a certain time horizon, bounded or unbounded, where $i = 1, \dots, m$.
We assume that the stochastic process for generating targets is the one already
described in the previous sections. It is obvious that each utility function
$U_i$ is a function of the policy vector $\pi := \{\pi_1, \dots, \pi_m\}$ consisting of all the
policies followed by each agent. In general the space of all policies $\Pi$ is just
an uncountable set containing all conceivable policies chosen by an agent and
it does not have any other structure. Thus we assume that the policy space
$\Pi_i$ of agent $i$ is equal to a fixed policy space $\Pi$ which is independent of $i$,
for any $i = 1, \dots, m$. Therefore the policy vector $\pi \in \Pi^m$. Denoting with
$\pi_{-j}$ the policy specification of all the agents except agent $j$, i.e.,
$\pi_{-j} := (\pi_1, \dots, \pi_{j-1}, \pi_{j+1}, \dots, \pi_m)$, we may write the policy vector $\pi$ as $(\pi_j, \pi_{-j})$. Using
this notation we can formulate the following definition, adapted from [7]:

Definition 18 A policy vector $\pi^*$ is called a pure Nash equilibrium if for all
$j = 1, \dots, m$:

\[ U_j(\pi_j^*, \pi_{-j}^*) = \max_{\pi_j \in \Pi} U_j(\pi_j, \pi_{-j}^*). \tag{21} \]

Moreover, a policy vector $\pi$ is called efficient if there is no other policy vector
that yields higher utilities to all agents.
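For a game with finitely many policies, condition (21) can be checked by brute force over unilateral deviations. The sketch below does this for a hypothetical two-agent toy game that is not part of the paper: each agent covers a region `"L"` or `"R"`, and its utility is 0.5 if it covers a region alone and 0.25 if both agents cover the same region. All names are illustrative assumptions.

```python
def is_pure_nash(utilities, profile, policy_sets):
    """True if no agent can raise its own utility by deviating unilaterally."""
    for j, options in enumerate(policy_sets):
        current = utilities[j](profile)
        for alt in options:
            deviated = profile[:j] + (alt,) + profile[j + 1:]
            if utilities[j](deviated) > current:
                return False
    return True


def make_utility(j):
    """Toy utility of agent j: share of targets it serves (hypothetical)."""
    return lambda profile: 0.5 if profile[j] != profile[1 - j] else 0.25


utilities = [make_utility(0), make_utility(1)]
policy_sets = [("L", "R"), ("L", "R")]
split_is_nash = is_pure_nash(utilities, ("L", "R"), policy_sets)
overlap_is_nash = is_pure_nash(utilities, ("L", "L"), policy_sets)
```

In this toy game the profile in which the agents split the regions satisfies (21), while the overlapping profile does not, since either agent gains by moving to the uncovered region.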

Under the target generation assumptions followed so far, we have the follow- 
ing: 

Proposition 19 Let us call $\bar{\pi}_j$ the policy assignment for agent $j$ corresponding
to our algorithm. Then the policy vector $\bar{\pi} := \{\bar{\pi}_j\}_{j = 1, \dots, m}$ is an efficient pure
Nash equilibrium for the given agents' utilities.

Proof: It is immediate to see that the policy vector $\bar{\pi}$ satisfies equation (21),
and thus is a pure Nash equilibrium. It is also clear that there cannot be any other
policy assignment which yields strictly higher utilities to all agents, simply
because when there is a new outstanding service request all agents move directly
toward that location, trying to satisfy it as if there were no other agent in the
environment. ■
On the other hand, observe that we do not claim in the previous proposition
that our algorithm provides the unique efficient pure Nash equilibrium for the
given set of utility functions. For instance, to find other efficient pure Nash
equilibria it is sufficient to modify the algorithm during the initial phases, when
the FT points are not uniquely determined. These modifications produce
different policy vectors which are nevertheless all efficient pure Nash equilibria, as
is immediate to see.

Our game-theoretic formulation of the given algorithm belongs to a class of 
multi-player games called potential games. In a potential game the difference 
in the utility reached by any of the agents for two different policy choices, when 
the policies of the other agents are kept fixed, can be measured by a potential 
function that depends on the policy vector and not on the label of any agent. 
The fact that our game-theoretic formulation belongs to this class is obvious
since the functional form of the utility function is the same for each agent, that
is, $U_j = U$ for any agent $j$. So as potential function we take $U$.

The formal definition is as follows: 

Definition 20 A potential game is the set $\{U_1, \dots, U_m, \psi\}$ consisting of agent
utilities $U_1(\pi), \dots, U_m(\pi)$ and a potential function $\psi : \Pi^m \to \mathbb{R}$, such that for
every agent $a_j$ and for any policy assignments $\pi'_j, \pi''_j \in \Pi_j$ and $\pi_{-j} \in \prod_{k \neq j} \Pi_k$:

\[ U_j(\pi'_j, \pi_{-j}) - U_j(\pi''_j, \pi_{-j}) = \psi(\pi'_j, \pi_{-j}) - \psi(\pi''_j, \pi_{-j}). \]

An extension of this concept is provided by:

Definition 21 An ordinal potential game is the set $\{U_1, \dots, U_m, \psi\}$ consisting
of agent utilities $U_1(\pi), \dots, U_m(\pi)$ and a potential function $\psi : \Pi^m \to \mathbb{R}$, such
that for every agent $a_j$ and for any policy assignments $\pi'_j, \pi''_j \in \Pi_j$ and
$\pi_{-j} \in \prod_{k \neq j} \Pi_k$:

\[ U_j(\pi'_j, \pi_{-j}) - U_j(\pi''_j, \pi_{-j}) > 0 \ \text{ if and only if } \ \psi(\pi'_j, \pi_{-j}) - \psi(\pi''_j, \pi_{-j}) > 0. \]
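For finite policy sets, the ordinal potential condition can be verified exhaustively. The following sketch checks it on a hypothetical two-agent toy game (not taken from the paper) in which each agent covers region `"L"` or `"R"`, earns 0.5 when alone in its region and 0.25 otherwise, and the candidate potential is the number of distinct regions covered. The game, the potential, and all names are assumptions for illustration.

```python
from itertools import product


def is_ordinal_potential(utilities, potential, policy_sets):
    """Check: U_j increases under a unilateral deviation iff the potential does."""
    for profile in product(*policy_sets):
        for j, options in enumerate(policy_sets):
            for alt in options:
                deviated = profile[:j] + (alt,) + profile[j + 1:]
                du = utilities[j](deviated) - utilities[j](profile)
                dpsi = potential(deviated) - potential(profile)
                if (du > 0) != (dpsi > 0):
                    return False
    return True


def make_utility(j):
    """Toy utility of agent j in a two-agent game (hypothetical)."""
    return lambda profile: 0.5 if profile[j] != profile[1 - j] else 0.25


utilities = [make_utility(0), make_utility(1)]
policy_sets = [("L", "R"), ("L", "R")]
covered_regions = lambda profile: len(set(profile))  # candidate potential
constant = lambda profile: 0                          # a potential that must fail

coverage_ok = is_ordinal_potential(utilities, covered_regions, policy_sets)
constant_ok = is_ordinal_potential(utilities, constant, policy_sets)
```

The coverage count passes the check, while the constant function fails because some deviations strictly improve an agent's utility without changing the potential.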

At this point, we are ready to introduce the global utility function for this
multi-player game formulation. The global utility function for this game is
given by $U_g(\pi) = -T_\pi$, where $T_\pi$ is the system time under policy vector $\pi$.
An important aspect in the game-theoretic approach is to understand to which
extent the utility functions of the individual players are compatible with the
global utility function. To this aim the following definition has been introduced
in [7]:

Definition 22 The set of agents' utilities $\{U_j(\pi)\}_{j = 1, \dots, m}$ is aligned with the
global utility $U_g(\pi)$ iff the set $\{U_1, \dots, U_m, U_g\}$ forms an ordinal potential game,
with potential function given by $U_g$.

It is immediate to prove the following: 

Proposition 23 The given agents utilities are aligned with the global utility 
function, in this game-theoretic formulation of the algorithm. 

Proof: Let us focus on one agent, say agent $j$. If its utility function
increases, it is able to service a larger number of targets in the given time
horizon. It can happen that some other agents will have a corresponding decrease
in their utility function, and this happens exactly when agent $j$, due to a
policy change, services some targets more rapidly. This, in turn, will increase
$U_g$. ■
In general, it is true that alignment does not prevent pure Nash equilibria 
from being suboptimal from the point of view of the global utility. Moreover, 
even efficient pure Nash equilibria (i.e. pure Nash equilibria which yield the 
highest utility to all agents) can be suboptimal from the perspective of the 
global utility function. Such a phenomenon is indeed what happens in our 
construction. 



Proposition 24 The policy vector $\bar{\pi}$ corresponding to the policies realized by
our algorithm is an efficient pure Nash equilibrium which is possibly suboptimal
from the point of view of global utility.

Proof: We already know that $\bar{\pi}$ is an efficient pure Nash equilibrium and
from the analysis developed in Section 4 we know that it yields a critical point
for the system time $T_\pi$. Thus it corresponds to a critical point for $U_g(\pi)$, either
a local maximum or a saddle point. On the other hand, as noted for $T_\pi$, $U_g$
may have in general several local maxima and our algorithm is not guaranteed
to converge to a global maximum. ■



6 Numerical results 

In this section, we present simulation results showing the performance of the 
proposed policies for various scenarios. 

6.1 Uniform distribution, light load 

In the numerical experiments, we first consider $m = 9$, choose $Q$ as a unit
square, and set $\varphi = 1$ (i.e., we consider a spatially uniform target-generation
process). This choice allows us to determine easily the optimal placement of
reference points, at the centers of a tessellation of $Q$ into nine equal squares,
and compute analytically the optimal system time. In fact, it is known that the
expected distance of a point $q$ randomly sampled from a uniform distribution
within a square of side $L$ from the center of the square $c$ is

\[ E[\|q - c\|] = \frac{\sqrt{2} + \log(1 + \sqrt{2})}{6} L \approx 0.3826 \, L. \]

The results for a small value of $\lambda$, i.e., $\lambda = 0.5$, are presented in Figure 5.
The average service time converges to a value that is very close to the theoretical
limit computed above, taking $L = 1/\sqrt{m} = 1/3$. In both cases, the reference
points converge (albeit very slowly) to the generators of a MVT, while the
average system time quickly approaches the optimal value.
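The closed-form constant above is easy to cross-check. The sketch below evaluates the formula and compares it with a Monte Carlo estimate of the mean distance from the center of a square of side $L$; the sample size, the seed, and the function names are arbitrary choices made for this illustration.

```python
import math
import random


def expected_center_distance(side):
    """Closed form (sqrt(2) + log(1 + sqrt(2))) / 6 * L for a square of side L."""
    return (math.sqrt(2.0) + math.log(1.0 + math.sqrt(2.0))) / 6.0 * side


def monte_carlo_center_distance(side, n=200_000, seed=0):
    """Sample mean of the distance from the center for uniform points in the square."""
    rng = random.Random(seed)
    half = side / 2.0
    total = 0.0
    for _ in range(n):
        total += math.hypot(rng.uniform(-half, half), rng.uniform(-half, half))
    return total / n


closed_form = expected_center_distance(1.0 / 3.0)     # nine equal squares, L = 1/3
estimate = monte_carlo_center_distance(1.0 / 3.0)
```

With $L = 1/3$ the closed form gives roughly 0.1275, which is the reference value for the light-load experiments under the assumption of unit agent speed.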

6.2 Non-uniform distribution, light load 

We also present in Figure 6 results of similar numerical experiments with a 
non-uniform distribution, namely an isotropic normal distribution centered at 
(0.25, 0.25), with standard deviation equal to 0.25. 
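
One way such targets could be drawn is sketched below. The text does not state how samples falling outside the environment are handled, so this sketch assumes that the environment is still the unit square and that out-of-bounds draws are simply rejected and redrawn. The function name and seed are hypothetical.

```python
import random


def sample_normal_target(rng, mean=(0.25, 0.25), std=0.25):
    """Draw one target location from an isotropic normal distribution.

    Assumption (not stated in the text): draws outside the unit square
    are rejected, which yields a normal density truncated to the square.
    """
    while True:
        x = rng.gauss(mean[0], std)
        y = rng.gauss(mean[1], std)
        if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0:
            return (x, y)


rng = random.Random(1)  # fixed seed, for repeatability only
targets = [sample_normal_target(rng) for _ in range(5000)]
mean_x = sum(t[0] for t in targets) / len(targets)
mean_y = sum(t[1] for t in targets) / len(targets)
print(f"sample mean: ({mean_x:.3f}, {mean_y:.3f})")
```

Because the truncation removes more mass on the low side of each coordinate than on the high side, the sample mean lies somewhat above 0.25 in both coordinates.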

6.3 Uniform distribution, dependency on the target generation rate 

An interesting set of numerical experiments evaluates the performance of the 
proposed policies over a large range of values of the target generation rate λ. 




Figure 5: Numerical simulation in the light-load case, for a uniform spatial 
distribution. Top left: the actual service times as a function of time, for the 
two policies, compared with the optimal system time. Top right: the initial 
configuration of the nine agents. Bottom left and right: paths followed by the 
reference points up to t = 10^4 (corresponding to approximately 5,000 targets), 
using the two policies. The locations of all targets visited by one of the agents 
are also shown. 

In Section 21 we proved the convergence of the system's behavior to an efficient 
steady state, with high probability as λ → 0, as confirmed by the simulations 
discussed above. For large values of λ, however, the assumption that vehicles are 
able to return to their reference point breaks down, and the convergence result 
is no longer valid. In Figure 7 we report results from numerical experiments on 
scenarios involving m = 3 agents, and values of λ ranging from 1/2 to 32. In 
the figure, we also report the known (asymptotic) lower bounds on the system 
time (with 3 agents), as derived in [10], and the system time obtained with the 
proposed policies in a single-agent scenario. 
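
To make the role of the generation rate concrete, the sketch below produces target streams for the range of rates considered here. It assumes, purely for illustration, that targets arrive as a Poisson process (exponential inter-arrival times with mean 1/λ) with locations uniform in the unit square; the function name, horizon, and seed are hypothetical, and this is not the simulator used for the reported results.

```python
import random


def generate_targets(rate, horizon, seed=0):
    """Return a list of (arrival_time, x, y) tuples up to the given horizon.

    Assumptions: Poisson arrivals with the given rate, and locations drawn
    uniformly in the unit square.
    """
    rng = random.Random(seed)
    t = 0.0
    targets = []
    while True:
        t += rng.expovariate(rate)  # exponential gap with mean 1/rate
        if t > horizon:
            break
        targets.append((t, rng.random(), rng.random()))
    return targets


# Rates spanning the range explored in this section (1/2 to 32).
for rate in (0.5, 1, 2, 4, 8, 16, 32):
    arrivals = generate_targets(rate, horizon=1000.0)
    print(f"rate {rate}: {len(arrivals)} targets in 1000 time units")
```

With rate 0.5 and a horizon of 10^4, the expected number of targets is 0.5 × 10^4 = 5,000, which matches the count quoted in the figure captions for the light-load runs.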

The performance of both proposed policies is close to optimal for small λ, 
as expected. The sensor-based policy behaves well over a large range of target 
generation rates; in fact, the numerical results suggest that the policy provides 
a system time within a constant factor of the optimum, with a factor of 
approximately 1.6. 

Figure 6: Numerical simulation in the light-load case, for a normal spatial 
distribution. Top left: the actual service times as a function of time, for the two 
policies. Top right: the initial configuration of the nine agents. Bottom left and 
right: paths followed by the reference points up to t = 10^4 (corresponding to 
approximately 5,000 targets), using the two policies. The locations of all targets 
visited by one of the agents are also shown. 

However, as λ increases, the performance of the no-communication policy 
degrades significantly, almost approaching the performance of a single-vehicle 
system over an intermediate range of values of λ. Our intuition for this 
phenomenon is the following. As agents do not return to their own reference points 
between visiting successive targets, their efficiency decreases since they are no 
longer able to effectively separate regions of responsibility. In practice, unless 
they communicate and concentrate on their Voronoi region, as in the sensor-based 
policy, agents are likely to duplicate efforts as they pursue the same 
target, and effectively behave as a single-vehicle system. Interestingly, this 
efficiency loss seems to decrease for large λ, and the numerical results suggest that 
the no-communication policy recovers a performance similar to that of the 
sensor-based policy in the heavy load limit. Unfortunately, we are not able at this time to 
provide a rigorous analysis of the proposed policies for general values of the 
target generation rate. 
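
The separation of regions of responsibility discussed above can be illustrated with a nearest-agent test. This is a minimal sketch, assuming that Voronoi regions are computed from the agents' current positions with Euclidean distance; the function names and coordinates are hypothetical, and ties are broken by lowest agent index for simplicity.

```python
import math


def voronoi_owner(target, agent_positions):
    """Index of the agent closest to the target (lowest index wins ties)."""
    return min(
        range(len(agent_positions)),
        key=lambda i: math.dist(target, agent_positions[i]),
    )


def targets_in_own_cell(agent_index, outstanding_targets, agent_positions):
    """Outstanding targets that fall in the given agent's Voronoi cell."""
    return [
        t for t in outstanding_targets
        if voronoi_owner(t, agent_positions) == agent_index
    ]


# Illustrative configuration: three agents and four outstanding targets.
agents = [(0.2, 0.2), (0.8, 0.2), (0.5, 0.8)]
outstanding = [(0.1, 0.1), (0.9, 0.3), (0.5, 0.9), (0.45, 0.1)]
for i in range(len(agents)):
    print(i, targets_in_own_cell(i, outstanding, agents))
```

Since every target has exactly one owner under this rule, no two agents pursue the same target, which is the duplication of effort that the no-communication policy cannot rule out.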

Figure 7: System time provided by the policies proposed in this paper, as a 
function of the target generation rate λ. The system is composed of three 
vehicles, and the target points are generated uniformly in the unit square. 

7 Conclusions 

In this paper we considered two very simple strategies for multiple vehicle 
routing in the presence of dynamically-generated targets, and analyzed their 
performance in light load conditions, i.e., when the target generation rate is very 
small. The strategies we addressed in this paper are based on minimal 
assumptions on the ability of the agents to exchange information: in one case they do 
not explicitly communicate at all, and in the other case, agents are only aware 
of other agents' current location. A possibly unexpected and striking result 
of our analysis is the following: the collective performance of the agents using 
such minimal or no-communication strategies is (locally) optimal, and is in fact 
as good as that achieved by the best known decentralized strategies. Moreover, 
the proposed strategies do not rely on the knowledge of the target generation 
process, and make minimal assumptions on the target spatial distribution; in 
fact, the convexity and boundedness assumptions on the support of the spatial 
distribution can be relaxed, as long as path connectivity of the support and 
absolute continuity of the distribution are ensured. Also, the distribution need 
not be constant: indeed, the algorithm will provide a good approximation to a 
local optimum for the cost function as long as the characteristic time it takes for 
the target generation process to vary significantly is much greater than the 
relaxation time of the algorithm. In summary, the proposed strategies can be seen 
as a learning mechanism in which the agents learn the target generation process, 
and the ensuing target spatial distribution, and adapt their own behavior to it. 

The proposed strategies are very simple to implement, as they only require 
storage of the coordinates of points visited in the past and simple algebraic 
calculations; the "sensor-based" strategy also requires a device to estimate the 
position of other agents. Simple implementation and the absence of active 
communication make the proposed strategies attractive, for example, in embedded 
systems and stealthy applications. The game-theoretic interpretation of our 
results also provides some insight into how territorial, globally optimal behavior 
can arise in a population of selfish but rational individuals even without explicit 
mechanisms for territory marking and defense. 
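
As an illustration of how little computation the stored coordinates require, the sketch below updates a reference point from the targets an agent has visited. It rests on an assumption that is not restated in this section: that the reference point is the point minimizing the total distance to the previously visited targets (the geometric median), which is consistent with convergence to the generators of an MVT. The Weiszfeld iteration used here is a standard method swapped in for the example, not necessarily the authors' procedure, and all names are hypothetical.

```python
import math


def total_distance(point, targets):
    """Sum of Euclidean distances from a point to all stored targets."""
    return sum(math.dist(point, t) for t in targets)


def geometric_median(targets, iterations=200, eps=1e-12):
    """Approximate the point minimizing total distance to the targets
    using Weiszfeld's iteration, started from the centroid.

    Simplification: a target coinciding with the current iterate is
    skipped in that step to avoid division by zero.
    """
    x = sum(t[0] for t in targets) / len(targets)
    y = sum(t[1] for t in targets) / len(targets)
    for _ in range(iterations):
        weight_sum = x_sum = y_sum = 0.0
        for tx, ty in targets:
            d = math.hypot(tx - x, ty - y)
            if d < eps:
                continue
            w = 1.0 / d  # closer targets pull harder on the next iterate
            weight_sum += w
            x_sum += w * tx
            y_sum += w * ty
        if weight_sum == 0.0:
            break
        x, y = x_sum / weight_sum, y_sum / weight_sum
    return (x, y)


# Illustrative history of visited targets for one agent.
visited = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (4.0, 4.0)]
reference = geometric_median(visited)
print(f"reference point: ({reference[0]:.3f}, {reference[1]:.3f})")
```

The only state an agent needs under this reading is the list of visited coordinates; each update is a handful of additions, divisions, and square roots per stored target.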

While we were able to prove that the proposed strategies perform efficiently 
for small values of the target generation rate, little is known about their per- 
formance in other regimes. In particular, we have shown numerical evidence 
that suggests that the first strategy we introduced, requiring no communica- 
tion, performs poorly when targets are generated very frequently, whereas the 
performance of the sensor-based strategy is in fact comparable to that of the 
best known strategies for the heavy load case. 

Extensions of this work will include the analysis and design of efficient strate- 
gies for general values of the target generation rate, for different vehicle dynamics 
models (e.g., including differential constraints on the motion of the agents), and 
heterogeneous systems in which both service requests and agents can belong to 
several different classes with different characteristics and abilities. 